
                                 FLEX 
                                ======
                     
NAME
====
flex - fast lexical analyzer generator

SYNOPSIS
========
flex -dfirstvFILT -c[efmF] -Sskeleton_file filename

DESCRIPTION
===========
flex is a rewrite of lex intended to right some of that tool's deficiencies:
in particular, flex generates lexical analyzers much faster, and the analyzers
use smaller tables and run faster.

OPTIONS
=======
In addition to lex's -t flag, flex has the following options:
-d
makes the generated scanner run in debug mode.  Whenever a pattern is 
recognized the scanner will write to stderr a line of the form:

    --accepting rule #n

Rules are numbered sequentially with the first one being 1.

-f
has the same effect as lex's -f flag (do not compress the scanner
tables); the mnemonic changes from fast compilation to (take your pick) 
full table or fast scanner. The actual compilation takes longer, since flex
is I/O bound writing out the big table. This option is equivalent to -cf
(see below).

-i
instructs flex to generate a case-insensitive scanner.  The case of letters
given in the flex input patterns will be ignored, and the rules will be
matched regardless of case.  The matched text given in yytext will have the
preserved case (i.e., it will not be folded).
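
The effect can be pictured with a small hand-written C sketch (this is not
flex output, and the name match_nocase is hypothetical): the comparison ignores
case, while the caller keeps working with the input exactly as it was typed.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex: returns the length of `pattern`
 * if `input` starts with it when case is ignored, or 0 otherwise.
 * The caller keeps using `input` itself, so the original case survives,
 * just as yytext is not folded under -i. */
size_t match_nocase(const char *input, const char *pattern)
{
    size_t i;

    for (i = 0; pattern[i] != '\0'; ++i) {
        if (input[i] == '\0')
            return 0;               /* input ended before the pattern did */
        if (tolower((unsigned char)input[i]) !=
            tolower((unsigned char)pattern[i]))
            return 0;               /* mismatch even with case ignored */
    }
    return i;
}
```

Given the input "SwItCh(x)" and the pattern "switch", the sketch reports a
match of length 6, and the matched text is still "SwItCh".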

-r
specifies that the scanner uses the REJECT action.

-s
causes the default rule (that unmatched scanner input is echoed to stdout)
to be suppressed.  If the scanner encounters input that does not match any of
its rules, it aborts with an error.  This option is useful for finding holes
in a scanner's rule set.

-v
has the same meaning as for lex (print to stderr a summary of statistics of
the generated scanner).  Many more statistics are printed, though, and the
summary spans several lines.  Most of the statistics are meaningless to the
casual flex user.

-F
specifies that the fast scanner table representation should be used. This
representation is about as fast as the full table representation (-f), and for
some sets of patterns will be considerably smaller (and for others, larger).
In general, if the pattern set contains both "keywords" and a catch-all,
"identifier" rule, such as in the set:

       "case"    return ( TOK_CASE );
       "switch"  return ( TOK_SWITCH );
       ...
       "default" return ( TOK_DEFAULT );
       [a-z]+    return ( TOK_ID );

then you're better off using the full table representation.  If only the
"identifier" rule is present and you then use a hash table or some such to
detect the keywords, you're better off using -F. This option is equivalent to
-cF (see below).
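
The second arrangement (a lone "identifier" rule plus a separate keyword
lookup) might be sketched as follows.  This is only an illustration: it uses a
sorted table and the standard bsearch() routine rather than a hash table, and
the token values and the name classify_identifier are assumptions, not
anything flex provides.

```c
#include <stdlib.h>
#include <string.h>

/* Token codes are illustrative; a real scanner would take them from
 * its parser's definitions. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name for bsearch(). */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

static int compare_keyword(const void *key, const void *entry)
{
    return strcmp((const char *)key,
                  ((const struct keyword *)entry)->name);
}

/* Intended to be called from the action of the identifier rule, e.g.
 *     [a-z]+    return ( classify_identifier( yytext ) );
 */
int classify_identifier(const char *text)
{
    const struct keyword *kw = bsearch(text, keywords,
        sizeof keywords / sizeof keywords[0], sizeof keywords[0],
        compare_keyword);

    return kw != NULL ? kw->token : TOK_ID;
}
```

With this in place the pattern set shrinks to the single [a-z]+ rule, which is
the case where the text above suggests -F.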

-I
instructs flex to generate an interactive scanner.  Normally, scanners
generated by flex always look ahead one character before deciding that a rule
has been matched.  At the possible cost of some scanning overhead (it's not
clear that more overhead is involved), flex will generate a scanner which only
looks ahead when needed.  Such scanners are called interactive because if you
want to write a scanner for an interactive system such as a command shell, you
will probably want the user's input to be terminated with a newline, and
without -I the user will have to type a character in addition to the newline
in order to have the newline recognized.  This leads to dreadful interactive
performance.

If all this seems too confusing, here's the general rule: if a human will
be typing in input to your scanner, use -I, otherwise don't; if you don't care
about how fast your scanners run and don't want to make any assumptions about
the input to your scanner, always use -I.

Note, -I cannot be used in conjunction with full or fast tables, i.e., the -f,
-F, -cf, or -cF flags.

-L
instructs flex to not generate #line directives (see below).

-T
makes flex run in trace mode.  It will generate a lot of messages to
standard output concerning the form of the input and the resultant
non-deterministic and deterministic finite automata.  This option is mostly
for use in maintaining flex.

-c[efmF]
controls the degree of table compression. -ce directs flex to construct
equivalence classes, i.e., sets of characters which have identical lexical
properties (for example, if the only appearance of digits in the flex input is
in the character class "[0-9]" then the digits '0', '1', ..., '9' will all be
put in the same equivalence class).
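
The idea of an equivalence class can be sketched in C as below.  This is not
flex's own algorithm, which this document does not describe; it is a minimal
illustration in which two characters share a class exactly when they appear in
the same character sets.  The names build_equiv_classes, NCHARS and MAX_SETS
are hypothetical.

```c
#include <string.h>

#define NCHARS   128
#define MAX_SETS 32   /* one bit per set; unsigned long has at least 32 bits */

/* `sets` holds nsets strings, each listing the characters of one
 * character class (or a single literal character) used in the patterns.
 * On return ec[c] is the class number (starting at 1) of character c.
 * Returns the number of classes, or 0 if nsets is too large. */
int build_equiv_classes(const char *const sets[], int nsets, int ec[NCHARS])
{
    unsigned long sig[NCHARS];
    int c, d, s, nclasses = 0;

    if (nsets > MAX_SETS)
        return 0;

    /* Signature of c: bit s is set if c appears in sets[s]. */
    for (c = 0; c < NCHARS; ++c) {
        sig[c] = 0;
        for (s = 0; s < nsets; ++s)
            if (c != 0 && strchr(sets[s], c) != NULL)
                sig[c] |= 1UL << s;
    }

    /* Characters with equal signatures share a class. */
    for (c = 0; c < NCHARS; ++c) {
        ec[c] = 0;
        for (d = 0; d < c; ++d)
            if (sig[d] == sig[c]) {
                ec[c] = ec[d];
                break;
            }
        if (ec[c] == 0)
            ec[c] = ++nclasses;
    }
    return nclasses;
}
```

For the sets "0123456789", "abc" and "a", all ten digits land in one class,
'b' and 'c' share another, 'a' gets its own, and every remaining character
falls into a single catch-all class.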

-cf
specifies that the full scanner tables should be generated - flex should not
compress the tables by taking advantage of similar transition functions for
different states.

-cF
specifies that the alternate fast scanner representation (described
above under the -F flag) should be used.

-cm
directs flex to construct meta-equivalence classes, which are sets of
equivalence classes (or characters, if equivalence classes are not being used)
that are commonly used together.

A lone -c
specifies that the scanner tables should be compressed but neither equivalence
classes nor meta-equivalence classes should be used.

The options -cf or -cF and -cm do not make sense together - there is no
opportunity for meta-equivalence classes if the table is not being compressed.
Otherwise the options may be freely mixed.

The default setting is -cem which specifies that flex should generate
equivalence classes and meta-equivalence classes.  This setting provides the
highest degree of table compression.  You can trade larger tables for
faster-executing scanners, with the following generally being true:

    slowest            smallest
               -cem
               -ce
               -cm
               -c
               -c{f,F}e
               -c{f,F}
    fastest            largest

-Sskeleton_file
overrides the default skeleton file from which flex constructs its scanners. 
You'll never need this option unless you are doing flex maintenance or
development.

INCOMPATIBILITIES WITH LEX
==========================
flex is fully compatible with lex, with the following exceptions:

There is no run-time library to link with.  You needn't specify -ll
when linking, and you must supply a main program.  (Hacker's note: since
the lex library contains a main() which simply calls yylex(), you actually
can be lazy and not supply your own main program and link with -ll.)

lex's %r (Ratfor scanners) and %t (translation table) options are not
supported.

The do-nothing -n flag is not supported.

When definitions are expanded, flex encloses them in parentheses.
With lex, the following

    NAME    [A-Z][A-Z0-9]*
    %%
    foo{NAME}?      printf( "Found it\n" );
    %%

will not match the string "foo" because when the macro is expanded the rule is
equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence is such that the '?' is 
associated with "[A-Z0-9]*".  With flex, the rule will be expanded to 
"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.

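The difference between the two expansions of a definition can be reproduced
with a few lines of C.  This is a hypothetical illustration of the textual
substitution only, not code taken from lex or flex; the name expand_definition
is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Replaces the first occurrence of "{name}" in `rule` with `def`,
 * wrapped in parentheses when `parenthesize` is nonzero.
 * Returns 0 on success, -1 if the reference is absent or a buffer
 * is too small. */
int expand_definition(const char *rule, const char *name, const char *def,
                      int parenthesize, char *out, size_t outsize)
{
    char ref[64];
    const char *hit;
    int n;

    n = snprintf(ref, sizeof ref, "{%s}", name);
    if (n < 0 || (size_t)n >= sizeof ref)
        return -1;

    hit = strstr(rule, ref);
    if (hit == NULL)
        return -1;

    /* prefix + optional "(" + definition + optional ")" + suffix */
    n = snprintf(out, outsize, "%.*s%s%s%s%s",
                 (int)(hit - rule), rule,
                 parenthesize ? "(" : "",
                 def,
                 parenthesize ? ")" : "",
                 hit + strlen(ref));
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

Expanding NAME in "foo{NAME}?" without parentheses gives "foo[A-Z][A-Z0-9]*?";
with them it gives "foo([A-Z][A-Z0-9]*)?", so the '?' applies to the whole
definition.
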
yymore() is not supported.

The undocumented lex-scanner internal variable yylineno is not supported.

If your input uses REJECT, you must run flex with the -r flag.  If you leave
out the flag, the scanner will abort at run-time with a message that the
scanner was compiled without the flag being specified.

The input() routine is not redefinable, though may be called to read
characters following whatever has been matched by a rule.  If input()
encounters an end-of-file the normal yywrap() processing is done.  A "real"
end-of-file is returned as EOF.

Input can be controlled by redefining the YY_INPUT macro.
YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
action is to place up to max_size characters in the character buffer "buf"
and return in the integer variable "result" either the number of characters
read or the constant YY_NULL (0 on Unix systems) to indicate EOF.
The default YY_INPUT reads from the file-pointer "yyin" (which is by default
stdin), so if you just want to change the input file, you needn't redefine
YY_INPUT - just point yyin at the input file.

A sample redefinition of YY_INPUT (in the first section of the input file):

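    %{
    #undef YY_INPUT
    /* Read one character per call; c is an int so that EOF can be told
       apart from every valid character. */
    #define YY_INPUT(buf,result,max_size) \
        { \
        int c = getchar(); \
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
        }
    %}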

You can also add in things like keeping track of the input line number
this way; but don't expect your scanner to go very fast.
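
The line-number bookkeeping just mentioned might look like the following
sketch.  It reads from an in-memory string so that it can be compiled by
itself; the names input_text, input_pos and input_lines are hypothetical, and
the YY_NULL definition is only a stand-in for the one a real scanner supplies.
Because the scanner reads ahead, a count kept this way reflects characters
delivered to the scanner, which may run slightly ahead of the text matched so
far.

```c
#include <stddef.h>

#ifndef YY_NULL
#define YY_NULL 0   /* stand-in so the sketch compiles outside a scanner */
#endif

/* Hypothetical names, not part of flex. */
static const char *input_text = "ab\ncd\n";  /* text to scan */
static size_t input_pos = 0;                 /* next character to deliver */
static int input_lines = 0;                  /* newlines delivered so far */

/* Same calling sequence as YY_INPUT(buf,result,max_size): deliver one
 * character per call, counting newlines as they pass through. */
#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) \
    { \
    (void)(max_size); \
    if (input_text[input_pos] == '\0') { \
        (result) = YY_NULL; \
    } else { \
        (buf)[0] = input_text[input_pos++]; \
        if ((buf)[0] == '\n') \
            ++input_lines; \
        (result) = 1; \
    } \
    }
```

Calling the macro repeatedly on the sample text delivers six characters, then
YY_NULL, and leaves input_lines at 2.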

output() is not supported. Output from the ECHO macro is done to the
file-pointer "yyout" (default stdout).

Trailing context is restricted to patterns which have either a fixed-sized 
leading part or a fixed-sized trailing part. For example, "a*/b" and "a/b*"
are okay, but not "a*/b*". This restriction is due to a bug in the trailing
context algorithm given in Principles of Compiler Design (and Compilers - 
Principles, Techniques, and Tools) which can result in mismatches.
Try the following lex program

    %%
    x+/xy           printf( "I found \"%s\"\n", yytext );

on the input "xxy".  (If anyone knows of a fast algorithm for
finding the beginning of trailing context for an arbitrary
pair of regular expressions, please let me know!)
If you must have arbitrary trailing context, you can use yyless() to effect it.

flex reads only one input file, while lex's input is made up of the
concatenation of its input files.

ENHANCEMENTS
============
Exclusive start-conditions can be declared by using %x instead of %s.
These start-conditions have the property that when they are active,
no other rules are active. Thus a set of rules governed by the same exclusive
start condition describe a scanner which is independent of any of the other
rules in the flex input.  This feature makes it easy to specify
"mini-scanners" which scan portions of the input that are syntactically
different from the rest (e.g., comments).
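
What an exclusive start condition buys can be seen in a hand-written C
analogue.  This is not flex output; it is a minimal sketch in which two states
stand in for start conditions, and the names strip_comments, NORMAL and
IN_COMMENT are hypothetical.  While the comment state is active, none of the
ordinary handling is tried at all, which is the property described above.

```c
#include <stddef.h>

/* The two states play the role of start conditions. */
enum state { NORMAL, IN_COMMENT };

/* Copies `in` to `out` with C comments removed; `out` must be at least
 * as large as `in` including the terminator. Returns the output length. */
size_t strip_comments(const char *in, char *out)
{
    enum state st = NORMAL;
    size_t n = 0;

    while (*in != '\0') {
        if (st == NORMAL) {
            if (in[0] == '/' && in[1] == '*') {
                st = IN_COMMENT;    /* enter the "mini-scanner" */
                in += 2;
            } else {
                out[n++] = *in++;   /* ordinary text: copy through */
            }
        } else {
            if (in[0] == '*' && in[1] == '/') {
                st = NORMAL;        /* back to the ordinary rules */
                in += 2;
            } else {
                ++in;               /* comment text: discard */
            }
        }
    }
    out[n] = '\0';
    return n;
}
```

In a flex input the same separation would come from declaring the comment
condition with %x and attaching the comment rules to it, so that they never
interfere with the rest of the rule set.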

flex dynamically resizes its internal tables, so directives like "%a 3000"
are not needed when specifying large scanners.

The scanning routine generated by flex is declared using the macro YY_DECL.
By redefining this macro you can change the routine's name and its calling
sequence.  For example, you could use:

    #undef YY_DECL
    #define YY_DECL float lexscan( a, b ) float a, b;

to give it the name lexscan, returning a float, and taking two floats as
arguments.
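
The mechanism can be tried outside a scanner with a stand-in body.  The sketch
below assumes that the generated file writes the scanning routine as YY_DECL
followed by a brace-enclosed body; the body here is a dummy, and the
declaration uses a prototype rather than the old-style parameter list shown
above.

```c
/* Redefine the declaration of the scanning routine. */
#undef YY_DECL
#define YY_DECL float lexscan(float a, float b)

/* Stand-in for the routine the generated scanner would define. */
YY_DECL
{
    /* A real scanner body would match patterns; this only shows that
     * the redefined name and arguments are in effect. */
    return a + b;
}
```

The caller then invokes lexscan( x, y ) wherever it would otherwise have
called yylex().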

flex generates #line directives mapping lines in the output to their origin in
the input file.

You can put multiple actions on the same line, separated with semicolons.
With lex, the following

    foo    handle_foo(); return 1;

is truncated to

    foo    handle_foo();

flex does not truncate the action.  Actions that are not enclosed in braces are
terminated at the end of the line.

Actions can be begun with "%{" and terminated with "%}".  In this case, flex
does not count braces to figure out where the action ends - actions are
terminated by the closing "%}".  This feature is useful when the enclosed
action has extraneous braces in it (usually in comments or inside inactive
#ifdef's) that throw off the brace-count.

All of the scanner actions (e.g., ECHO, yywrap ...) except the unput() and
input() routines, are written as macros, so they can be redefined if
necessary without requiring a separate library to link to.

FILES
=====
flex_code.skel
skeleton scanner
flex_code.fastskel
skeleton scanner for -f and -F
h.fskelcom
common definitions for skeleton files
h.fskeldef
definitions for compressed skeleton file
h.faskeldef
definitions for -f, -F skeleton file

SEE ALSO
========
lex(1)
M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator

AUTHOR
======
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson.  Original version by Jef Poskanzer.  Fast table
representation is a partial implementation of a design done by Van
Jacobson.  The implementation was done by Kevin Gong and Vern Paxson.
Thanks to the many flex beta-testers, especially Casey Leedom,
Nick Christopher, Chris Faylor, Eric Goldman, Craig Leres, Mohamed el Lozy,
Esmond Pitt, Jef Poskanzer, and Dave Tallman.  Thanks to John Gilmore,
Bob Mulcahy, Rich Salz, and Richard Stallman for help with various
distribution headaches.

Send comments to:

     Vern Paxson
     Real Time Systems
     Bldg. 46A
     Lawrence Berkeley Laboratory
     1 Cyclotron Rd.
     Berkeley, CA 94720

     (415) 486-6411

     vern@lbl-{csam,rtsg}.arpa
     ucbvax!lbl-csam.arpa!vern

DIAGNOSTICS
===========
flex scanner jammed -
a scanner compiled with -s has encountered an input string which wasn't
matched by any of its rules.

flex input buffer overflowed -
a scanner rule matched a string long enough to overflow the scanner's internal
input buffer (as large as BUFSIZ in stdio.h). You can edit flexskelcom.h and
increase YY_BUF_SIZE and YY_MAX_LINE to increase this limit.

REJECT used and scanner was not generated using -r -
just like it sounds.  Your scanner uses REJECT.  You must run flex on the
scanner description using the -r flag.

old-style lex command ignored -
the flex input contains a lex command (e.g., "%n 1000") which is being
ignored.

BUGS
====
Use of unput() or input() trashes the current yytext and yyleng.

Use of unput() to push back more text than was matched can
result in the pushed-back text matching a beginning-of-line ('^')
rule even though it didn't come at the beginning of the line.

Nulls are not allowed in flex inputs or in the inputs to
scanners generated by flex.  Their presence generates fatal
errors.

Do not mix trailing context with the '|' operator used to
specify that multiple rules use the same action.  That is,
avoid constructs like:

        foo/bar      |
        bletch       |
        bugprone     { ... }

They can result in subtle mismatches.  This is actually not a problem if there
is only one rule using trailing context and it is the first in the list
(so the above example will actually work okay).  The problem is due to
fall-through in the action switch statement, causing non-trailing-context rules
to execute the trailing-context code of their fellow rules.  This should be
fixed, as it's a nasty bug and not obvious.  The proper fix is for flex to spit
out a FLEX_TRAILING_CONTEXT_USED #define and then have the backup logic in a
separate table which is consulted for each rule-match, rather than as part of
the rule action.  The place to do the tweaking is in add_accept() - any kind 
soul want to be a hero?

The pattern:

       x{3}

is considered to be variable-length for the purposes of trailing context, even
though it has a clear fixed length.

Due to both buffering of input and read-ahead, you cannot intermix
calls to, for example, getchar() with flex rules and expect it to work.
Call input() instead.

The total number of table entries listed by the -v flag excludes the table
entries needed to determine what rule has been matched.  The number of such
entries is equal to the number of DFA states if the scanner was not compiled
with -r, and greater than the number of states if it was.

The scanner run-time speeds have not been optimized as much as they deserve.
Van Jacobson's work shows that they can go quite a bit faster still.
